
XploR is an R package for robust allelic imbalance (AI) and large-scale copy number analysis of whole exome sequencing (WES) data in clinical genomics. It features noise reduction using a panel of normal samples for both coverage and allelic counts, smoothing and segmentation algorithms, and tumor purity and ploidy estimation. XploR supports flexible reruns restricted by chromosome region, tumor purity, or diploid coverage, and includes integrated ISCN annotation and visualization. These capabilities make XploR suitable for both clinical and research applications in genomic copy number analysis.




Features

Installation

Install the latest version from GitHub using devtools:

install.packages("devtools")
devtools::install_github("sj-cmpb-se/XploR")

Quick Test run

All files needed for a test run are placed in the inst/extdata folder, and RunExamplePipeline() uses these files for the test run. Panel of normals generation is not included in the test run; for details on building a panel of normals, see Prepare Reference Files.

library(XploR)
RunExamplePipeline( out_dir = "/path_to_output_dir" )

Running this function is equivalent to running the following steps separately:

1. Run segmentation based on allelic imbalance information. The example uses the “cbs” segmentation method.

RunAIsegmentation(
    seg = seg,
    cov = cov,
    ai = ai,
    gender = gender,
    out_dir = out_dir,
    prefix = prefix,
    ai_pon = ai_pon,
    aitype = "dragen"
  )
Parameters for RunAIsegmentation
Parameter Type Description Example Value
seg character Path to the GATK segment file. "sample.seg"
cov character Path to the GATK denoised coverage count file. "sample.counts"
ai character Path to the BAF file or allelic count file. "sample.baf"
ai_pon character Path to the AI panel of normals Rdata file generated by PONAIprocess(). "PON_AI.Rdata"
gender character Sample gender ("female" or "male"), passed to ReadAI(). "female"
out_dir character Output directory path. "results/"
prefix character Output file prefix. "Sample1"
mergeai numeric MAF difference threshold for merging segments under “merge” segmentation mode (default: 0.15). 0.15
mergecov numeric CNV difference threshold for merging segments (default: 0.2). 0.2
snpmin numeric Minimum SNPs for MAF segmentation under “merge” segmentation mode (default: 7). 7
minsnpcov numeric Minimum coverage of SNPs to be included (default: 20). 20
maxgap numeric Maximum gap size inside a bin; if exceeded, start a new bin (default: 1,000,000). 1000000
snpnum integer SNP number in each bin (default: 30). 30
maxbinsize numeric Maximum bin size (default: 5,000,000). 5000000
minbinsize numeric Minimum bin size (default: 500,000). 500000
minsnpcallaicutoff numeric Minimum SNPs for reliable CNLOH/GAINLOH (default: 10). 10
mergecovminsize numeric Minimum size for GATK segment merge (default: 500,000). 500000
segmethod character Segmentation method: "merge" for stepwise merging, "cbs" for CBS segmentation. "cbs"
cbssmooth character If using CBS, "yes" to apply smoothing before segmentation, "no" to skip smoothing. "yes"
aitype character Type of allelic imbalance data: "gatk", "other", or "dragen" (see below for requirements). "dragen"

Note on aitype column requirements:
- If "gatk" or "other": input must include columns CONTIG, POSITION, ALT_COUNT, REF_COUNT, REF_NUCLEOTIDE, and ALT_NUCLEOTIDE.
- If "dragen": input must include columns contig, start, stop, refAllele, allele1, allele2, allele1Count, allele2Count, allele1AF, and allele2AF.
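Before running RunAIsegmentation(), it can help to confirm that an allelic count table carries the columns required for its aitype. The following is a minimal sketch; `required_ai_columns` and `missing_ai_columns()` are hypothetical helpers, not XploR functions, and the column sets are copied from the requirements listed above.

```r
# Hypothetical helper, not part of XploR: minimum columns per aitype,
# copied from the column requirements listed above.
required_ai_columns <- list(
  gatk   = c("CONTIG", "POSITION", "ALT_COUNT", "REF_COUNT",
             "REF_NUCLEOTIDE", "ALT_NUCLEOTIDE"),
  other  = c("CONTIG", "POSITION", "ALT_COUNT", "REF_COUNT",
             "REF_NUCLEOTIDE", "ALT_NUCLEOTIDE"),
  dragen = c("contig", "start", "stop", "refAllele", "allele1", "allele2",
             "allele1Count", "allele2Count", "allele1AF", "allele2AF")
)

# Returns the required columns that are absent (character(0) if none).
missing_ai_columns <- function(ai_table, aitype) {
  aitype <- match.arg(aitype, names(required_ai_columns))
  setdiff(required_ai_columns[[aitype]], colnames(ai_table))
}

# Example: GATK allelic counts files begin with SAM-style "@" header lines.
# ai <- read.delim("sample.allelic_counts", comment.char = "@")
# missing_ai_columns(ai, "gatk")
```

An empty result means the table has at least the minimum columns; extra columns are ignored.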

2. Run model likelihood calculation and selection.

RunModelLikelihood(
    seg = paste0(out_dir,"/",prefix,"_GATK_AI_segment.tsv"),
    out_dir = out_dir,
    prefix = prefix,
    gender = gender,
    modelminprobes = 20,
    modelminAIsize = 5000000,
    minsf = 0.4,
    callcov = 0.3,
    thread = 6)
Parameters for RunModelLikelihood
Parameter Type Description Example Value
seg character Path to the combined segment file (e.g., the output of the segmentation step above) "results/Sample1_GATK_AI_segment.tsv"
out_dir character Output directory for results "results/"
prefix character Prefix for output files "Sample1"
gender character Sample gender ("male" or "female") "female"
modelminprobes integer Minimum number of probes/SNPs per segment to include in modeling 20
modelminAIsize numeric Minimum segment size (bp) to include in modeling 5000000
minsf numeric Minimum scale factor to consider in model selection 0.4
callcov numeric Cutoff for calling subclonal events based on total copy number 0.3
thread integer Number of CPU threads to use for parallel processing 6
callcovcutoff numeric (Optional) Threshold for calling without modeling. 0.3
callaicutoff numeric (Optional) Threshold for calling without modeling. 0.3
minsnpcallaicutoff integer (Optional) Minimum number of SNPs required to call an AI segment 10

Notes:

3. Annotate segments.

AnnotateSegments(
    input = paste0(out_dir,"/",prefix,"_final_calls.tsv"),
    out_dir = out_dir,
    prefix = prefix,
    cytoband = cytoband,
    whitelist_edge = whitelist_edge,
    gene = gene)
Parameters for AnnotateSegments
Parameter Type Description Example Value
input character Path to XploR CNV calling output. "results/Sample1_final_calls.tsv"
out_dir character Output directory for results "results/"
prefix character Prefix for output files "Sample1"
cytoband character Path to cytoband annotation file (TSV). See Prepare Reference Files for details. "data/cytoBand.txt"
whitelist_edge character Path to the detectable edge file for each chromosome. See Prepare Reference Files for details. "data/whitelist.txt"
gene character Path to gene annotation file. See Prepare Reference Files for details. "data/gene_anno.txt"

4. Generate the CNV plot.

RunPlotCNV(
    seg = paste0(out_dir,"/",prefix,"_CNV_annotation.tsv"),
    cr = cr,
    ballele = ai,
    ai_binsize = 100000,
    cov_binsize = 100000,
    whitelist = whitelist_bed,
    gender = gender,
    out_dir = out_dir,
    prefix = prefix,
    aitype = "dragen"
  )
Parameters for RunPlotCNV
Parameter Type Description Example Value
seg character Path to final annotated call file. "results/Sample1_CNV_annotation.tsv"
cr character Path to the GATK denoised copy ratio file with extension .denoisedCR.tsv "data/sample.denoisedCR.tsv"
ballele character Path to the B-allele file (from DRAGEN, GATK, or other source). See aitype for required columns. "data/sample.tumor.baf.gz"
ai_binsize numeric Bin size for AI plot (default: 100,000) 100000
cov_binsize numeric Bin size for coverage plot (default: 100,000) 100000
whitelist character Path to whitelist file for regions to include "data/whitelist.txt"
gender character Sample gender ("male" or "female") "female"
out_dir character Output directory for plot "results/"
prefix character Sample ID or output prefix "Sample1"
aitype character Type of allelic imbalance data: "gatk", "dragen", or "other". "dragen"

5. Generate the AI segment quality file.

BafQC(
    annofile = paste0(out_dir,"/",prefix,"_CNV_annotation.tsv"),
    out_dir = out_dir,
    prefix = prefix)
Parameters for BafQC
Parameter Type Description Example Value
annofile character Path to the CNV annotation file (e.g., *_CNV_annotation.tsv) "results/Sample1_CNV_annotation.tsv"
out_dir character Output directory for the QC summary file "results/"
prefix character Prefix for the QC output file "Sample1"

Prepare input files

  1. Run GATK in tumor-only mode with default parameters. Below is a summary of the GATK tumor-only mode commands used in our pipeline; please see the GATK website for details. The files used by XploR are sample.counts, sample.called.seg, sample.allelic_counts, and sample.denoisedCR.tsv.
  2. The allelic count file can also be generated by other software, such as DRAGEN or samtools.
Supporting allelic count file format
aitype parameter value Software Minimum columns File extension
dragen Illumina DRAGEN contig, start, refAllele, allele2, allele1Count, allele2Count "sample.tumor.ballele.counts.gz"
gatk GATK CONTIG, POSITION, ALT_COUNT, REF_COUNT, REF_NUCLEOTIDE, ALT_NUCLEOTIDE "sample.allelic_counts"
other Other (e.g. samtools) CONTIG, POSITION, ALT_COUNT, REF_COUNT, REF_NUCLEOTIDE, ALT_NUCLEOTIDE ""

Prepare Reference Files

Panel of normal reference

A Panel of Normals (PON) is required and should be generated using GATK, DRAGEN, or any other software capable of producing allelic count files.

Note: Male and female PON files need to be generated separately.

A. Whitelist, Blacklist, and Detectable Boundary Files

These files are generated from the GATK PON HDF5 file, a cytoband file, and gender information. They are essential for downstream processing and include:
- a blacklist BED file,
- a whitelist BED file, and
- a detectable edge (boundary) file for each chromosome.

See the function documentation in R: ?PonProcess or help("PonProcess", package = "XploR").

Example usage:

PonProcess(
  pon_file = pon_hdf5_file,
  blacklist_bed = output_blacklist_bed,
  whitelist_bed = output_whitelist_bed,
  cytoband = cytoband,
  detectable_edge = output_detectable_edge,
  gender = gender
)

B. Panel of Normals Based on Allelic Count Files

The ai_pon_file should be a text file listing the paths to normal allelic count files generated by GATK, DRAGEN, or other software.
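A minimal sketch of building that list file, assuming all normal allelic count files of one sex sit in a single directory and share a file suffix. `write_pon_list()` and the directory layout are hypothetical, not part of XploR.

```r
# Hypothetical helper, not part of XploR: writes one normal allelic count
# path per line, the list format described for ai_pon_file.
# Build separate lists for male and female normals.
write_pon_list <- function(normal_dir, pattern, list_file) {
  paths <- list.files(normal_dir, pattern = pattern, full.names = TRUE)
  if (length(paths) == 0) {
    stop("no files matching ", pattern, " in ", normal_dir)
  }
  writeLines(normalizePath(paths), list_file)
  invisible(paths)
}

# Example (directory name is illustrative):
# write_pon_list("normals_female", "\\.allelic_counts$", "pon_ai_file_list.txt")
```

Absolute paths are written so the list stays valid regardless of the working directory from which PONAIprocess() is later called.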

You can process these files to generate the PON reference for allelic imbalance using:

PONAIprocess(
  ai_pon_file = ai_pon_file,
  aitype = "gatk",
  minsnpcov = 20,
  output = "/Pathtoresults",
  prefix = "PONAI",
  maxgap = 2000000,
  maxbinsize = 5000000,
  minbinsize = 500000,
  snpnum = 30,
  gender = "female"
)
Parameters for PONAIprocess
Parameter Type Description Example Value
ai_pon_file character Path to a text file listing PoN AI file paths (one per line) "pon_ai_file_list.txt"
aitype character Type of AI input file ("gatk", "dragen", or "other"), passed to ReadPonAI() "gatk"
minsnpcov integer Minimum SNP coverage to include a site in the AI calculation 20
maxgap numeric Maximum allowed gap between SNPs within a bin (in base pairs) 1000000
maxbinsize numeric Maximum allowed bin size (in base pairs) 5000000
minbinsize numeric Minimum allowed bin size (in base pairs) 500000
snpnum integer Target number of SNPs per bin 30
output character Output directory for the processed PoN AI Rdata file "results/"
prefix character Prefix for the output file "PON"

Gene annotation reference:

Gene annotation can be obtained from various sources (e.g., Ensembl, UCSC, Gencode, RefSeq). An example file is included with the package:

gene <- system.file("extdata", "RefSeqCurated.genePred.gene_region.txt", package = "XploR")
head(read.table(gene, header = TRUE, sep = "\t"))

Cytoband annotation reference:

Cytoband annotation files are typically downloaded from UCSC. An example file is included:

cytoband <- system.file("extdata", "hg19_cytoBand.dat", package = "XploR")
head(read.table(cytoband, header = TRUE, sep = "\t"))

Algorithm

Binning Strategy for allelic count Data

The BinMaf function implements a flexible binning strategy for minor allele frequency (MAF) data, supporting both tumor samples and panel of normals (PoN) samples. Binning uses a fixed number of SNPs per bin, with additional criteria to handle genomic gaps and bin size limits. Within each bin, Gaussian mixture modeling (GMM) is applied to identify clusters in the MAF distribution.

Key features:

This strategy ensures that bins are of consistent size and SNP content, while avoiding the inclusion of widely separated SNPs in the same bin, and is robust for both tumor and normal samples.
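The bin-closing rules can be sketched as follows. This is a hypothetical illustration, not BinMaf(): the function name `bin_snps()` and the precedence of the rules (close on a gap larger than `maxgap` or when the span would exceed `maxbinsize`; otherwise close once `snpnum` SNPs are collected and the bin spans at least `minbinsize`) are assumptions, and the per-bin GMM step is omitted.

```r
# Hypothetical sketch of SNP binning on one chromosome (not BinMaf()).
# pos: sorted SNP positions; returns an integer bin id per SNP.
bin_snps <- function(pos, snpnum = 30, maxgap = 1e6,
                     maxbinsize = 5e6, minbinsize = 5e5) {
  stopifnot(!is.unsorted(pos))
  bin <- integer(length(pos))
  current <- 1L
  start <- pos[1]
  n_in_bin <- 0L
  for (i in seq_along(pos)) {
    if (n_in_bin > 0L) {
      gap <- pos[i] - pos[i - 1]
      span_if_added <- pos[i] - start
      # A bin is "full" once it has snpnum SNPs and spans >= minbinsize
      full <- n_in_bin >= snpnum && (pos[i - 1] - start) >= minbinsize
      if (gap > maxgap || span_if_added > maxbinsize || full) {
        current <- current + 1L   # start a new bin at this SNP
        start <- pos[i]
        n_in_bin <- 0L
      }
    }
    bin[i] <- current
    n_in_bin <- n_in_bin + 1L
  }
  bin
}
```

Forcing a break on large gaps keeps widely separated SNPs out of the same bin, while the minimum span keeps dense SNP clusters from producing very short bins.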

Segmentation of MAF track

In addition to CBS (Circular Binary Segmentation), our pipeline supports a “merge” mode for segmentation based on minor allele frequency (MAF) values. While CBS is the default and recommended strategy, “merge” mode offers a step-wise, rule-based approach to combine adjacent MAF segments.
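The following sketch illustrates the general idea of a step-wise, rule-based merge under simplifying assumptions: adjacent segments are merged when their MAFs differ by less than `mergeai`, or when either has fewer than `snpmin` SNPs, and the merged MAF is the SNP-weighted mean. `merge_maf_segments()` is hypothetical and is not XploR's implementation of “merge” mode.

```r
# Hypothetical single-pass merge of adjacent MAF segments (not XploR code).
# maf: segment MAFs in genomic order; nsnp: SNP count per segment.
merge_maf_segments <- function(maf, nsnp, mergeai = 0.15, snpmin = 7) {
  stopifnot(length(maf) == length(nsnp), length(maf) > 0)
  out_maf <- maf[1]
  out_n <- nsnp[1]
  for (i in seq_along(maf)[-1]) {
    k <- length(out_maf)
    close_enough <- abs(maf[i] - out_maf[k]) < mergeai
    too_small <- nsnp[i] < snpmin || out_n[k] < snpmin
    if (close_enough || too_small) {
      # Absorb segment i; the merged MAF is the SNP-weighted mean
      out_maf[k] <- (out_maf[k] * out_n[k] + maf[i] * nsnp[i]) /
        (out_n[k] + nsnp[i])
      out_n[k] <- out_n[k] + nsnp[i]
    } else {
      out_maf <- c(out_maf, maf[i])  # start a new merged segment
      out_n <- c(out_n, nsnp[i])
    }
  }
  data.frame(MAF = out_maf, n_snp = out_n)
}
```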

Step-wise Merging Strategy:

Note:

MAF Bias Correction Using Panel of Normal (PoN) Allelic Counts

To correct for systematic bias in MAF estimates, we use panel of normals (PoN) allelic count files as a reference. For each segment in the tumor (or other sample of interest), we compare the segment’s MAF to the distribution of MAF values observed in the PoN for the same genomic region. This ensures that technical or locus-specific MAF biases are removed only when the tumor segment does not deviate significantly from the normal reference.

Correction process:
- For each segment, identify all overlapping PoN segments and extract their MAF values.
- If the segment’s MAF is not significantly different from the PoN MAF distribution (assessed via a Wilcoxon test or a small absolute difference), apply a logit-based centering correction:
  - The segment MAF is transformed to the logit scale, centered by the PoN median MAF, and then inverse-logit transformed back and capped at 0.5.
- If the segment’s MAF is significantly different from the PoN, the original segment MAF is retained (no correction is applied), thus preserving true biological signal.
- This approach ensures that only technical or systematic biases are corrected, while real allelic imbalance events in the tumor are preserved.

This method uses the panel of normals as an adaptive reference, providing robust bias correction without shrinking true tumor signals.
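The correction rule can be sketched for a single segment as follows. This is an illustrative sketch under assumptions, not XploR's code: the hypothetical `correct_maf_bias()` uses a one-sample Wilcoxon signed-rank test of the PoN MAFs against the segment MAF, `alpha` and `min_diff` are placeholder thresholds, and "centering" is taken to mean subtracting the logit of the PoN median, so that a segment matching the PoN median maps to 0.5.

```r
# Hypothetical sketch of the PoN-based MAF bias correction for one segment.
#   seg_maf: segment MAF; pon_maf: MAFs of overlapping PoN segments.
#   alpha and min_diff are placeholder thresholds, not XploR defaults.
correct_maf_bias <- function(seg_maf, pon_maf, alpha = 0.05, min_diff = 0.02) {
  pon_med <- median(pon_maf)
  # Correct when the difference is small, or when a one-sample Wilcoxon
  # signed-rank test finds no significant shift (evaluated only if needed).
  consistent_with_pon <- abs(seg_maf - pon_med) < min_diff ||
    wilcox.test(pon_maf, mu = seg_maf, exact = FALSE)$p.value >= alpha
  if (!consistent_with_pon) {
    return(seg_maf)  # likely true imbalance: keep the original MAF
  }
  # Logit-scale centering: a segment at the PoN median maps to 0.5
  centered <- plogis(qlogis(seg_maf) - qlogis(pon_med))
  min(centered, 0.5)  # MAF cannot exceed 0.5
}
```

With this reading, a heterozygous region whose normal samples consistently show MAF slightly below 0.5 is pulled back to 0.5, while a segment with a clearly lower MAF than the PoN is left untouched.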

Purity and diploid coverage scale factor estimation

Estimate a Beta–Binomial Over-Dispersion Parameter \(\theta\) from a Panel of Normals (PoN)

To accurately model over-dispersion in minor allele frequency (MAF) data, we estimate a beta-binomial dispersion parameter (\(\theta\)) using a panel of normal (PoN) samples. This allows us to account for extra-binomial variation and improves the likelihood calculation for each segment.

For each bin \(b\) and depth stratum:

Within each depth stratum, we take a robust center (median) of \(\widehat{\theta}_b\) to obtain \(\theta\) for that stratum.

where:

Prior assignment based on parsimony principle

Priors are assigned to each potential copy number combination based on the principle of parsimony, which favors simpler (biologically less complex) allele configurations. The biological difficulty level reflects the number of steps required to reach a given allele combination from the baseline diploid state (1,1), where each step represents either a gain or loss of one allele.
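Reading the difficulty definition literally, the number of steps from (1,1) to (major, minor) is |major - 1| + |minor - 1|; for example, CNLOH (2,0) needs one loss and one gain. The sketch below turns that count into a normalized prior. The exponential decay and its rate `lambda` are illustrative assumptions, not XploR's actual prior, and both functions are hypothetical.

```r
# Hypothetical sketch: biological difficulty = number of single-allele gains
# or losses needed to move from the diploid state (1,1) to (major, minor).
bio_difficulty <- function(major, minor) {
  abs(major - 1) + abs(minor - 1)
}

# Illustrative prior: weight decays exponentially with difficulty
# (the decay form and lambda are assumptions), normalized to sum to 1.
parsimony_prior <- function(major, minor, lambda = 1) {
  w <- exp(-lambda * bio_difficulty(major, minor))
  w / sum(w)
}

# Candidate configurations with major >= minor, up to 4 copies per allele.
grid <- expand.grid(major = 0:4, minor = 0:4)
grid <- grid[grid$major >= grid$minor, ]
grid$Bio_diff <- bio_difficulty(grid$major, grid$minor)
grid$prior <- parsimony_prior(grid$major, grid$minor)
```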

Tumor Copy Number Estimation

For each genomic segment, the model computes a range of possible tumor-specific copy numbers (CN_tumor) that could result from observed data under different cancer cell fractions (ccf):

\[ CN_{tumor} = \frac{C_i \times 2 / (\mu \times 100) - (1 - \rho) \times 2 - \rho \times (1 - ccf) \times 2}{\rho \times ccf} \]

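where:
- \(C_i\) is the observed (pseudo) coverage of segment \(i\),
- \(\mu\) is the diploid coverage scale factor,
- \(\rho\) is the tumor purity (fraction between 0 and 1), and
- \(ccf\) is the cancer cell fraction carrying the copy number change.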
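The CN_tumor equation translates directly into a vectorized R function; evaluating it over a grid of ccf values gives the range of candidate tumor copy numbers for one segment. `tumor_cn()` is a hypothetical name, and the input scale (pseudo coverage where a diploid segment at mu = 1 corresponds to 100) is inferred from the constant in the equation.

```r
# Candidate tumor copy number for a segment, transcribed from the
# CN_tumor equation above (hypothetical helper, vectorized over ccf).
#   cov: segment (pseudo) coverage C_i; mu: diploid coverage scale factor;
#   rho: tumor purity; ccf: cancer cell fraction.
tumor_cn <- function(cov, mu, rho, ccf) {
  observed_cn <- cov * 2 / (mu * 100)
  (observed_cn - (1 - rho) * 2 - rho * (1 - ccf) * 2) / (rho * ccf)
}

# Range of candidate copy numbers for one segment across a ccf grid.
ccf_grid <- seq(0.1, 1, by = 0.1)
candidates <- tumor_cn(120, mu = 1, rho = 0.8, ccf = ccf_grid)
```

For instance, pseudo coverage 120 with mu = 1 and rho = 0.8 gives a candidate copy number of about 3 at ccf = 0.5 and 2.5 at ccf = 1: the same coverage gain can be explained by a higher copy number in fewer cells.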

Likelihood Calculation

\[ \mathrm{Beta}(\alpha, \beta) \]

where:

For each segment, \(K\) is calculated as:

\[ K = \frac{\text{depth}}{1 + (\text{depth} - 1) \cdot \theta} - 1 \]

where:

\[ \text{Posterior Likelihood} = \text{MAF Likelihood} \times (\text{Prior})^\gamma \]

where:
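The likelihood pieces can be combined in a short R sketch. It rests on assumptions not stated in this section: the Beta shape parameters are taken as alpha = p * K and beta = (1 - p) * K, where p is the expected MAF, and p follows the usual mixture of tumor cells carrying (major, minor) and diploid cells. All function names are hypothetical, and the default gamma = 1 is a placeholder.

```r
# Effective concentration K from read depth and over-dispersion theta,
# following the formula for K above.
beta_k <- function(depth, theta) {
  depth / (1 + (depth - 1) * theta) - 1
}

# Assumed mixture model: a fraction rho * ccf of cells carries (major, minor);
# all remaining cells are diploid with one copy of each allele.
expected_maf <- function(major, minor, rho, ccf = 1) {
  w <- rho * ccf
  minor_copies <- w * minor + (1 - w) * 1
  total_copies <- w * (major + minor) + (1 - w) * 2
  minor_copies / total_copies
}

# Posterior likelihood = MAF likelihood * prior^gamma, computed on the log scale.
# Assumes Beta(alpha = p * K, beta = (1 - p) * K) with p the expected MAF.
posterior_likelihood <- function(maf, depth, theta, major, minor, rho,
                                 prior, gamma = 1, ccf = 1) {
  k <- beta_k(depth, theta)
  p <- expected_maf(major, minor, rho, ccf)
  maf_ll <- dbeta(maf, shape1 = p * k, shape2 = (1 - p) * k, log = TRUE)
  exp(maf_ll + gamma * log(prior))
}
```

Larger theta shrinks K, which widens the Beta density, so noisy regions penalize mismatches between observed and expected MAF less strongly.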

Assigning Calls for Each Segment

The SelectCallpersegment() function refines and selects the most likely allele combinations for each genomic segment, handling both clonal and subclonal events, and incorporates coverage differences and prior knowledge.

Output

Segmentation output

sample_GATK_AI_segment.tsv (Generated by ?RunAIsegmentation function)

Column Type Description Example_value
Sample character Sample identifier Sample1
Chromosome character Chromosome name 1
Start integer Start position (base pair) 123456
End integer End position (base pair) 234567
Num_Probes integer Number of probes/SNPs in the segment 25
Segment_Mean numeric Segment mean (log2 ratio) from CNV analysis 0.42
gatk_SM_raw numeric Raw segment mean from GATK 0.38
gatk_count integer Number of counts in GATK segment 30
gatk_baselinecov numeric The GATK baseline is an intermediate value calculated using gatk_SM_raw and gatk_count. 100.5
gatk_gender character Gender as reported by GATK female
pipeline_gender character Gender as used in pipeline female
MAF numeric Minor allele frequency for the segment 0.21
MAF_Probes integer Number of probes used to calculate MAF 18
MAF_gmm_G integer Number of GMM clusters in MAF distribution 2
MAF_gmm_weight numeric Mixture weight of the main GMM cluster 0.85
size integer Segment size in base pairs 111111
BreakpointSource character Source of breakpoint (GATK or Postprocess) GATK
FILTER character Quality tag for the segment (PASS or FAILED) PASS

Raw likelihood results under each configuration

sample_likelihood_raw.tsv (Generated by ?RunModelLikelihood() function)

Column Type Description Example_value
major integer Major allele copy number 2
minor integer Minor allele copy number 1
CN integer Total copy number (major + minor) 3
ccf numeric Cancer cell fraction 0.85
Bio_diff integer Biological difficulty score for the allele combination 3
prior numeric Prior probability for the allele combination 0.12
expected_maf numeric Expected minor allele frequency for this configuration 0.21
maf_ll numeric Log-likelihood for the observed MAF under this configuration -0.56
weighted_prior numeric Weighted log-prior (log(prior) × gamma) -2.13
exp_maf_ll numeric Exponentiated MAF log-likelihood 0.57
exp_prior numeric Exponentiated weighted prior 0.11
MAF_likelihood numeric Posterior likelihood for this configuration 0.065
Segcov numeric Pseudo Segment coverage 280
MAF numeric Observed minor allele frequency 0.19
mu numeric Diploid coverage scale factor 1.0
rho numeric Tumor purity (fraction between 0 and 1) 0.7
index character Segment index or identifier "12"
Tag character Segment inclusion/exclusion tag for summarizing total likelihood for a model (e.g., "Include", "Exclude") "Include"
ccf_MAF numeric Cancer cell fraction estimated from MAF and allele configuration only 0.81

Allelic combination results with maximum likelihood under each configuration

sample_top_likelihood_calls.tsv (Generated by ?SelectCallpersegment() function). The format is similar to sample_likelihood_raw.tsv, with the best allelic combination selected for each segment under each combination of diploid coverage scale factor and tumor purity.
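Selecting the best combination per segment and configuration amounts to a grouped arg-max over sample_likelihood_raw.tsv. The sketch below is a hypothetical illustration using that file's index, mu, rho, and MAF_likelihood columns; SelectCallpersegment() also handles subclonal events and coverage differences, which are not reproduced here.

```r
# Hypothetical sketch (not SelectCallpersegment()): grouped arg-max of
# MAF_likelihood per segment index and (mu, rho) configuration.
top_calls <- function(raw) {
  groups <- split(raw, list(raw$index, raw$mu, raw$rho), drop = TRUE)
  best <- lapply(groups, function(d) d[which.max(d$MAF_likelihood), ])
  out <- do.call(rbind, best)
  rownames(out) <- NULL
  out
}

# Example (path is illustrative):
# raw <- read.delim("results/Sample1_likelihood_raw.tsv", stringsAsFactors = FALSE)
# best <- top_calls(raw)
```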

Likelihood for each combination of diploid coverage scale factor and tumor purity

sample_Models_likelihood.tsv ( Generated by ?SelectFinalModel() function )

Column Type Description Example_value
mu numeric Diploid coverage scale factor (model parameter) 1.0
rho numeric Tumor purity (model parameter, fraction between 0 and 1) 0.7
total_log_likelihood_before_refine numeric Total log-likelihood for the model before refinement -1234.5
segments_n integer Number of segments included in the model 27
Likelihood_penalty_rows integer Number of segments penalized due to failed likelihood calculation 2
total_log_likelihood_after_refine numeric Total log-likelihood for the model after refinement -1220.2
diploid_n integer Number of diploid segments in the model 15
diploid_distance_to_integer numeric Mean distance to integer copy number for diploid segments 0.04
nondiploid_n integer Number of non-diploid segments in the model 12
nondiploid_distance_to_integer numeric Mean distance to integer copy number for non-diploid segments 0.11
total_distance_to_integer numeric Sum of diploid and non-diploid mean distances to integer copy number 0.15
ploidy numeric Mean copy number (ploidy) across all segments 2.4
Tier1 character Model tier label (e.g., "Tier1_Models", "Final_model_MAF") "Tier1_Models"
total_likelihood_cluster integer Rank based on total likelihood ( lower is better ) 1
diploid_distance_cluster integer Rank based on diploid distance to integer copy number ( lower is better ) 1
nondiploid_distance_cluster integer Rank based on non-diploid distance to integer copy number (lower is better) 1
total_likelihood_cluster_mean numeric Mean total log-likelihood for the level -1200.0
diploid_distance_cluster_mean numeric Mean diploid distance to integer for the level 0.03
nondiploid_distance_cluster_mean numeric Mean non-diploid distance to integer for the level 0.10

Final output of CNV calling

sample_final_calls.tsv (Generated by ?RunModelLikelihood() function)

Column Type Description Example_value
Chromosome character Chromosome name 1
Start integer Start position (base pair) 3301463
End integer End position (base pair) 247784114
size integer Segment size (bp) 244367069
Num_Probes integer Number of probes from GATK segment file 222
Call character Copy number call (e.g., REF, GAIN, LOSS, GAINLOH, CNLOH) REF
ccf_COV numeric Cancer cell fraction estimated from coverage 1
ccf_MAF numeric Cancer cell fraction estimated from MAF 0
ccf_final numeric Final cancer cell fraction after refinement 1
Segment_Mean numeric Final Segment mean (log2 ratio) 0.057631093
CNF_correct numeric Purity corrected copy number estimate from coverage 2.086898584
major integer Major allele copy number 1
minor integer Minor allele copy number 1
CN integer Total copy number (major + minor) 2
MAF numeric Observed minor allele frequency 0.5
MAF_correct numeric Purity corrected minor allele frequency 0.5
expected_maf numeric Expected minor allele frequency for this configuration 0.5
expected_cov numeric Expected pseudo coverage for this segment 90
MAF_Probes integer Number of probes used for MAF calculation 1110
MAF_gmm_G integer Number of GMM clusters in MAF distribution 5
MAF_gmm_weight numeric Mixture weight of the main GMM cluster 0.667871528
BreakpointSource character Source of breakpoint (GATK or Postprocess) GATK
FILTER character Quality tag for the segment (PASS or FAILED) PASS
maf_ll numeric Log-likelihood for the observed MAF 2.625299941
MAF_likelihood numeric Posterior likelihood for this configuration 8.891628731
mu numeric Diploid coverage scale factor 0.9
rho numeric Tumor purity (fraction between 0 and 1) 0.938
index character Segment index or identifier 1
gatk_SM_raw numeric Raw segment mean from GATK -0.094372
gatk_count integer Number of counts in GATK segment 361
gatk_baselinecov numeric The GATK baseline is an intermediate value calculated using gatk_SM_raw and gatk_count. 385.4038109
gatk_gender character Gender as reported by GATK female
pipeline_gender character Gender as used in pipeline female
CN_mix character Indicator for copy number mixture (No or CN_Mix) No
Model_source character Source of model selection (Coverage, Coverage + MAF, Diploid) Coverage + MAF
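As an example of downstream use, the sketch below reads sample_final_calls.tsv and keeps PASS segments whose call is not REF. `read_reportable_calls()` is a hypothetical helper, and the returned columns are only a selection from the table above.

```r
# Hypothetical helper: read sample_final_calls.tsv and keep PASS segments
# with a non-REF call, using column names from the table above.
read_reportable_calls <- function(path) {
  calls <- read.delim(path, stringsAsFactors = FALSE)
  keep <- calls$FILTER == "PASS" & calls$Call != "REF"
  calls[keep, c("Chromosome", "Start", "End", "Call", "CN",
                "major", "minor", "ccf_final"), drop = FALSE]
}

# Example (path is illustrative):
# calls <- read_reportable_calls("results/Sample1_final_calls.tsv")
```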

Model selection plots

Likelihood dot plot: The plot displays the likelihood ranking for all combinations of diploid coverage scale factor and tumor purity. The vertical dashed line indicates the likelihood cutoff used to define Tier 1 models.

Model plot: The model plot displays the likelihood values of different models, which are calculated based on potential combinations of diploid coverage scale factor and tumor purity. In the plot, red indicates higher likelihood, while blue signifies lower likelihood. The light blue dot indicates the final model selected by XploR.

Tier1 Models Overall: This plot shows copy number calls for each combination of diploid coverage scale factor and tumor purity. Red indicates gain, blue indicates loss, and white indicates no change. Each configuration is labeled on the y-axis. By evaluating coverage and allelic imbalance patterns in this overview, you can identify the reasonable range of diploid coverage scale factors and tumor purity values. This helps guide reruns with optimized parameter ranges if needed.

Tier1 Models Zoom in: A zoomed-in view that makes the y-axis configurations more visible for detailed inspection.

QC Summary Table

sample_PASS_STAT_chr.txt ( Generated by ?BafQC() function )

Column Type Description Example_value
chrom character Chromosome name (e.g., 1, 2, …, X, Y) 1
FILTER character Segment filter status PASS
Total_segment_count integer Total number of segments on the chromosome 25
PASS_Seg_Count integer Number of segments with PASS filter status 20
PASS_Seg_Percent numeric Fraction of segments with PASS status (0–1) 0.80
Total_segment_size integer Total size (bp) of all segments on the chromosome 249250621
PASS_Seg_Size integer Total size (bp) of PASS segments on the chromosome 199400497
PASS_Seg_Size_Percent numeric Fraction of total segment size that is PASS (0–1) 0.80
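The counts and fractions in this table can be derived from segment-level data. The sketch below is a hypothetical illustration, not BafQC() itself; it assumes a data frame with the Chromosome, size, and FILTER columns described in the output tables above.

```r
# Hypothetical sketch (not BafQC()): per-chromosome PASS counts, sizes,
# and fractions from segment-level Chromosome, size, and FILTER columns.
summarize_pass <- function(seg) {
  # split() orders chromosomes by their sorted (character) values
  per_chrom <- lapply(split(seg, seg$Chromosome), function(d) {
    pass <- d$FILTER == "PASS"
    data.frame(
      chrom = d$Chromosome[1],
      Total_segment_count = nrow(d),
      PASS_Seg_Count = sum(pass),
      PASS_Seg_Percent = mean(pass),
      Total_segment_size = sum(d$size),
      PASS_Seg_Size = sum(d$size[pass]),
      PASS_Seg_Size_Percent = sum(d$size[pass]) / sum(d$size)
    )
  })
  out <- do.call(rbind, per_chrom)
  rownames(out) <- NULL
  out
}
```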

Annotation file

sample_CNV_annotation.tsv ( Generated by ?AnnotateSegments() function, only unique columns are listed ).

ISCN calculation rules:
1. All segments are reported with start and end cytobands in ISCN format. However, certain considerations apply to the position of the centromere:
   a. In metacentric chromosomes, if a segment crosses the centromere and the gaps between the segment and the telomere on both sides are less than 5 Mb, only the chromosome number is reported.
   b. In metacentric chromosomes, if a segment does not cross the centromere, and the gaps between the segment and both the centromere and the telomere are less than 5 Mb, the chromosome number followed by ‘p’ or ‘q’ is reported.
   c. In acrocentric chromosomes, if the segment fulfills rule ‘b’ above, only the chromosome number is reported.
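These rules can be expressed as a small decision function. The sketch below is hypothetical: `iscn_label()`, its arguments, and the fallback "chromosome + start band - end band" format (mirroring the ISCN example value in the table below) are illustrative assumptions rather than AnnotateSegments() internals.

```r
# Hypothetical sketch of ISCN labeling rules a-c (not AnnotateSegments()).
#   crosses_cen: does the segment span the centromere?
#   tel_gaps:    gap(s) to the telomere(s); c(p, q) for centromere-crossing segments
#   cen_gap:     gap to the centromere for single-arm segments
#   arm:         "p" or "q" for single-arm segments
iscn_label <- function(chrom, start_band, end_band, crosses_cen,
                       tel_gaps, cen_gap = NA, arm = NA,
                       acrocentric = FALSE, limit = 5e6) {
  # Rule a: metacentric, crosses the centromere, near both telomeres
  if (!acrocentric && crosses_cen && all(tel_gaps < limit)) {
    return(as.character(chrom))
  }
  # Rules b and c: single arm, near both its centromere and its telomere
  whole_arm <- !crosses_cen && !is.na(cen_gap) &&
    cen_gap < limit && all(tel_gaps < limit)
  if (whole_arm) {
    if (acrocentric) return(as.character(chrom))  # rule c
    return(paste0(chrom, arm))                    # rule b
  }
  # Default: start and end cytobands, e.g. "1p36.33-p11.2"
  paste0(chrom, start_band, "-", end_band)
}
```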

Column Type Description Example_value
p_chromStart integer Detectable start position of p arm 10
p_chromEnd integer Detectable end position of p arm 121535434
p_first_name character Detectable name of first cytoband in p arm p36.33
p_last_name character Detectable name of last cytoband in p arm p11.2
q_chromStart integer Detectable Start position of q arm 121535435
q_chromEnd integer Detectable end position of q arm 247784114
q_first_name character Detectable name of first cytoband in q arm q11.1
q_last_name character Detectable name of last cytoband in q arm qter
p_gap_to_tel integer Gap from segment start to p arm telomere 0
p_gap_to_cen integer Gap from segment end to p arm centromere 10000
q_gap_to_tel integer Gap from segment end to q arm telomere 0
q_gap_to_cen integer Gap from segment start to q arm centromere 10000
ISCN character ISCN-style cytogenetic annotation 1p36.33-p11.2
Gene character Overlapping gene(s) in the segment TP53
Gene_count integer Number of overlapping genes 1

CNV plot

sample_CNV_plot.png (Generated by ?RunPlotCNV() function). The CNV plot shows a genome-wide summary of the copy number (top track), B-allele frequency (BAF, second track), tumor fraction (ccf, third track), and segment quality (bottom track). The copy number (CN) on the y-axis is a linear count of the number of copies of each chromosome in the tumor cells, taking tumor purity and tumor fraction into account. Each chromosome is plotted as a set of dots that collectively show the estimated sequence coverage for the chromosome, and as a narrow turquoise line that shows the final CN call for the chromosome. The BAF plot shows the variant allele fraction of SNPs across the genome, using the same coloring as the copy number plot. When the copy number of a chromosome changes, the BAF for the affected chromosome splits due to the imbalance in chromosome counts. Because the variance of B-allele frequencies is high, this splitting may be difficult to discern; to assist interpretation, a turquoise line is drawn at the median level to show the imbalance.

Model selection and rerun

Full function and parameter list

For a complete list of all functions and their parameters, please visit the XploR function reference.

Each function page includes detailed parameter descriptions, usage examples, and links to related documentation.